An Introduction to HBMs and their Application to Category Learning
Repeatedly draw from bags of black and white marbles with an unknown proportion of black marbles. If most marbles drawn so far were black:
\(\rightarrow\) High chance of the next marble also being black!
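This intuition can be made precise with a Beta-Binomial update. A minimal sketch (the counts, 9 black out of 10 draws, are made up for illustration): under a uniform Beta(1, 1) prior, the posterior predictive probability that the next marble is black is \((y + 1) / (n + 2)\), Laplace's rule of succession.

```python
from fractions import Fraction

def prob_next_black(y, n, alpha=1, beta=1):
    """Posterior predictive probability that the next draw is black,
    given y black marbles in n draws and a Beta(alpha, beta) prior."""
    return Fraction(y + alpha, n + alpha + beta)

# Hypothetical observation: 9 of 10 marbles drawn were black.
p = prob_next_black(9, 10)
print(p)  # → 5/6 (≈ 0.83): the next marble is very likely black
```

With no observations at all, the same formula falls back to the prior mean of 1/2.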
Goal
We want to build a Bayesian model that reverse-engineers the mind's reasoning about color distributions across bags.
We have \(N\) bags of marbles; for bag \(i\), \(y_i\) is the number of black marbles observed and \(n_i\) is the total number of marbles drawn.
Level 1 – Data
\(d_i: \big\{y_i, n_i \big\}\)
Level 2 – Bag-specific distribution
\(y_i ~ \big| ~ n_i \sim \text{Binom}(n_i, \theta_i)\)
Level 3 – General knowledge about bags
\(\theta_i \sim \text{Beta}(\alpha, \beta)\)
Level 4 – Hyperparameters (reparameterized in terms of the mean \(\mu = \frac{\alpha}{\alpha + \beta}\) and precision \(\phi = \alpha + \beta\))
\(\frac{\alpha}{\alpha + \beta} \sim \text{Unif}(0, 1)\)
\(\alpha + \beta \sim \text{Exp}(1)\)
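The generative process can be sketched top-down in a few lines of NumPy (the bag count and draws per bag are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Level 4 – Hyperparameters: mu = alpha/(alpha+beta), phi = alpha+beta
mu = rng.uniform(0, 1)
phi = rng.exponential(1)
alpha, beta = mu * phi, (1 - mu) * phi

# Level 3 – Bag-specific proportions of black marbles
N = 5                               # number of bags (arbitrary)
theta = rng.beta(alpha, beta, size=N)

# Levels 2/1 – Observed draws from each bag
n = np.full(N, 10)                  # 10 marbles drawn per bag (arbitrary)
y = rng.binomial(n, theta)
print(y)                            # counts of black marbles per bag
```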
Applying Bayes' Rule to the HBM
\[ \begin{gathered} \overbrace{P(\theta, \alpha, \beta ~ | ~ y)}^{\text{Posterior}} \propto \underbrace{P(\alpha, \beta)}_{\text{Hyperprior}} \overbrace{P(\theta ~ | ~ \alpha, \beta)}^{\text{Conditional Prior}} \underbrace{P(y ~ | ~ \theta, \alpha, \beta)}_{\text{Likelihood}} \end{gathered} \]
Posterior inference for \(\theta_i\) proceeds by integrating out \(\alpha\) and \(\beta\):
\[ \begin{align*} P(\theta_i ~ | ~ d_1, \dots, d_n) = \iint P(\theta_i ~ | ~ \alpha, \beta, d_i) P(\alpha, \beta ~ | ~ d_1, \dots, d_n) \,d\alpha \,d \beta \end{align*} \]
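Before turning to MCMC, the double integral can also be approximated on a coarse grid over \((\mu, \phi)\), using the fact that by conjugacy \(\theta_i \mid \alpha, \beta, d_i \sim \text{Beta}(\alpha + y_i, \beta + n_i - y_i)\). A minimal sketch (the data values are made up):

```python
import numpy as np
from scipy.stats import betabinom

# Hypothetical data: black-marble counts y out of n draws per bag
y = np.array([9, 8, 10, 1])
n = np.array([10, 10, 10, 10])

# Grid over the hyperparameters mu (mean) and phi (precision)
mus = np.linspace(0.01, 0.99, 99)
phis = np.linspace(0.1, 20.0, 100)
M, P = np.meshgrid(mus, phis)
alpha, beta = M * P, (1 - M) * P

# Unnormalized log posterior: Unif(0,1) x Exp(1) hyperprior + likelihood,
# with theta integrated out analytically (Beta-Binomial marginal)
log_post = -P                              # log Exp(1) density, up to a constant
for yi, ni in zip(y, n):
    log_post = log_post + betabinom.logpmf(yi, ni, alpha, beta)
w = np.exp(log_post - log_post.max())
w /= w.sum()

# E[theta_1 | d_1, ..., d_n]: average the conjugate posterior mean over the grid
post_mean_theta1 = np.sum(w * (alpha + y[0]) / (P + n[0]))
print(round(post_mean_theta1, 2))
```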
In practice, this integral is approximated numerically with Markov chain Monte Carlo (MCMC) methods, e.g., Hamiltonian Monte Carlo (HMC) as implemented in Stan:
```stan
// Beta-Binomial Hierarchical Model in Stan
data {
  int<lower=0> N;                        // Number of bags
  array[N] int<lower=0> n;               // Number of marbles drawn from each bag
  array[N] int<lower=0> y;               // Number of black marbles in each bag
}
parameters {
  real<lower=0, upper=1> mu;             // Hyperparameter: mean of the Beta distribution
  real<lower=0> phi;                     // Hyperparameter: precision of the Beta distribution
  array[N] real<lower=0, upper=1> theta; // Bag-specific proportion of black marbles
}
transformed parameters {
  // Reparameterization of the Beta distribution
  real<lower=0> alpha = mu * phi;
  real<lower=0> beta = (1 - mu) * phi;
}
model {
  mu ~ uniform(0, 1);                    // Hyperprior for mu
  phi ~ exponential(1);                  // Hyperprior for phi
  theta ~ beta(alpha, beta);             // Conditional prior for theta
  y ~ binomial(n, theta);                // Likelihood
}
```

Key Takeaway
The marble example demonstrates that HBMs align nicely with our intuition about how structured data can be used to form strong overhypotheses.
Why does this matter?
This abstract knowledge is what enables rapid learning from sparse data and one-shot generalization.
A mother points to an unfamiliar object lying on the counter and tells her child that this is a pen.
Question
By which features do children generalize the concept of a pen and recognize future instances of a pen as a pen?
Shape Bias
The expectation that members of a category tend to be similar in shape, which is learned by the age of 24 months (Smith et al., 2002).
| | Marble Example | Shapes Example |
|---|---|---|
| Structuring Variable | Bag | Object Category |
| Data | Marble | Object Exemplar |
| Features | Color | Shape, Color, Texture, Size, etc. |
| Feature Values | Binary | Categorical |
Copies of Levels 2–4 for each feature dimension (Color, Shape, Texture, Size)
Glassen & Nitsch (2016); Griffiths et al. (2024); Kemp et al. (2007)
Training: eight exemplars, two from each of four categories.

| Feature | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5 | Ex. 6 | Ex. 7 | Ex. 8 |
|---|---|---|---|---|---|---|---|---|
| Category | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 |
| Shape | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 |
| Texture | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Color | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Size | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
| Feature | 'Dax' | Object 1 | Object 2 | Object 3 |
|---|---|---|---|---|
| Category | 5 | ? | ? | ? |
| Shape | 5 | 5 | 6 | 6 |
| Texture | 9 | 10 | 9 | 10 |
| Color | 9 | 10 | 10 | 9 |
| Size | 1 | 1 | 1 | 1 |
After training, children (and the model) encounter a new object labeled with a novel noun, "dax".
Task: Which of the three candidate objects with unknown category labels is most likely to also be a dax?
Data based on Smith et al. (2002)
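A toy version of this inference can be sketched directly from the tables above: estimate, per feature dimension, how consistently same-category exemplars share a value (the overhypothesis), then score each candidate by the consistency of the dimensions it shares with the dax. This is a heuristic stand-in for the full HBM, not the model from the cited papers:

```python
from itertools import combinations

# Training exemplars (columns of the training table above)
train = {
    "Category": [1, 1, 2, 2, 3, 3, 4, 4],
    "Shape":    [1, 1, 2, 2, 3, 3, 4, 4],
    "Texture":  [1, 2, 3, 4, 5, 6, 7, 8],
    "Color":    [1, 2, 3, 4, 5, 6, 7, 8],
    "Size":     [1, 2, 1, 2, 1, 2, 1, 2],
}

def within_category_consistency(feature):
    """Fraction of same-category exemplar pairs sharing this feature value."""
    pairs = [(i, j) for i, j in combinations(range(8), 2)
             if train["Category"][i] == train["Category"][j]]
    vals = train[feature]
    return sum(vals[i] == vals[j] for i, j in pairs) / len(pairs)

consistency = {f: within_category_consistency(f)
               for f in ("Shape", "Texture", "Color", "Size")}

# Test items (columns of the test table above)
dax = {"Shape": 5, "Texture": 9, "Color": 9, "Size": 1}
candidates = {
    "Object 1": {"Shape": 5, "Texture": 10, "Color": 10, "Size": 1},
    "Object 2": {"Shape": 6, "Texture": 9,  "Color": 10, "Size": 1},
    "Object 3": {"Shape": 6, "Texture": 10, "Color": 9,  "Size": 1},
}

# Score: sum the consistency of every dimension shared with the dax
scores = {name: sum(consistency[f] for f in dax if obj[f] == dax[f])
          for name, obj in candidates.items()}
print(consistency)                  # only Shape is perfectly consistent
print(max(scores, key=scores.get))  # → Object 1 (shares the dax's shape)
```

Only Shape is perfectly predictive within categories in the training data, so the learned shape bias makes Object 1, which matches the dax in shape alone, the preferred generalization.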